For bluefish forage index modeling, we are selecting a set of predators that have high diet similarity to bluefish.
Garrison and Link (2000) evaluated similarity of predator diets on the Northeast US shelf to develop foraging guilds, using NEFSC bottom trawl survey data from 1973-1997. We are using diets from 1985-2020 to characterize the prey index, so more than 20 additional years of diet information are available to assess whether predator diet similarity has changed. In addition, identifying the predators with the most similar diets (and foraging habits) to bluefish is useful for this analysis.
Garrison and Link (2000) used hierarchical agglomerative clustering to evaluate groups of species with similar diets. Specifically, the Schoener similarity index (Schoener, 1970) was applied to assess the dietary overlap, \(D_{i,j}\), between predator pairs:
\[ D_{i,j} = 1 - 0.5 \sum_{k} |p_{i,k} - p_{j,k}| \]
where \(p_{i,k}\) = mean proportional volume of prey type k in predator i and \(p_{j,k}\) = mean proportional volume of prey type k in predator j.
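As a minimal sketch of the index defined above, the following base R function computes the overlap for two diet-proportion vectors and then fills a pairwise overlap matrix. The three predators, the four prey categories, and the function name `schoener` are made up for illustration and are not survey data or code from Garrison and Link (2000).

```r
# Schoener (1970) overlap between two diet-proportion vectors.
# p_i and p_j hold proportions over the same prey categories, each summing to 1.
schoener <- function(p_i, p_j) {
  stopifnot(length(p_i) == length(p_j))
  1 - 0.5 * sum(abs(p_i - p_j))
}

# Toy diets over four hypothetical prey categories: fish, squid, shrimp, crab
pred_a <- c(0.70, 0.20, 0.10, 0.00)
pred_b <- c(0.50, 0.30, 0.10, 0.10)
pred_c <- c(0.00, 0.00, 0.40, 0.60)

schoener(pred_a, pred_b)  # 0.8: mostly shared prey
schoener(pred_a, pred_c)  # 0.1: little shared prey

# Pairwise overlap matrix for all predators (rows of `diets`)
diets <- rbind(a = pred_a, b = pred_b, c = pred_c)
idx <- seq_len(nrow(diets))
overlap <- outer(idx, idx,
                 Vectorize(function(i, j) schoener(diets[i, ], diets[j, ])))
dimnames(overlap) <- list(rownames(diets), rownames(diets))
overlap
```

The index runs from 0 (no shared prey) to 1 (identical diet proportions), so the diagonal of the matrix is always 1.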
Garrison and Link (2000) used a set of 52 prey categories to characterize diet. These do not correspond to the current standardized prey categories, which range from highly aggregated general categories (“gencat”) through analysis categories (“analcat”) to collection categories (“collcat”). The lowest available taxonomic level is “pynam”. The number of prey groups within each category is:
# object is called `allfh`
library(dplyr)
load(url("https://github.com/Laurels1/Condition/raw/master/data/allfh.RData"))
# keep only the prey name and prey category columns
gencomlist <- allfh %>%
  select(pynam, gencat, analcat, collcat, gencom2, analcom3)
# count the distinct values in each column
gencomlist %>%
  summarise_all(n_distinct)
##   pynam gencat analcat collcat gencom2 analcom3
## 1  1373     21     108     327      21      107
It seems we could either reconstruct the categories from Garrison and Link (2000) or use the “analcom” category. Over 100 prey categories may be a bit much; alternatively, the additional detail on prey may help identify which piscivores are closest to bluefish. Scott is using a list of 56 prey categories that may be close enough to the original list.
preycats <- read_csv(here("fhdat/prey_categories.csv"))
The mean proportions were presumably taken over the entire time period 1973-1997 for the entire Northeast US shelf (survey area) and did not distinguish between seasons.
We could try to reproduce the original time period, add all years up to recent, do only 1997-recent, or some combination.
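To make the time-period choice concrete, here is a rough sketch of how diet proportions could be computed separately for an early and a recent period. It uses a toy data frame with hypothetical column names (`predator`, `year`, `prey_cat`, `volume`), not the columns of `allfh`. It also pools prey volume within each period before taking proportions, which is an assumption and may differ from how Garrison and Link (2000) averaged proportions.

```r
# Toy diet records; column names and values are hypothetical
stomachs <- data.frame(
  predator = rep(c("Bluefish", "Weakfish"), each = 4),
  year     = rep(c(1990, 1990, 2010, 2010), times = 2),
  prey_cat = c("fish", "squid", "fish", "squid",
               "fish", "shrimp", "fish", "shrimp"),
  volume   = c(8, 2, 3, 1, 5, 5, 9, 1)
)

# Split years at the end of the Garrison and Link (2000) time series
stomachs$period <- ifelse(stomachs$year < 1998, "early", "recent")

# Total prey volume per period, predator, and prey category
pooled <- aggregate(volume ~ period + predator + prey_cat,
                    data = stomachs, FUN = sum)

# Proportion of each prey category within a period-by-predator group
pooled$prop <- pooled$volume /
  ave(pooled$volume, pooled$period, pooled$predator, FUN = sum)

pooled[order(pooled$predator, pooled$period), ]
```

The resulting proportions for each period could then be passed to the overlap index separately, which would show whether a predator pair's similarity differs between periods.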
Also, we may want to include a different list of predators based on sample size. How many of each predator were observed in the earlier time period vs the later?
# count rows of `allfh` per predator in each time period
predn <- allfh %>%
  mutate(yearrange = case_when(year < 1998 ~ "early, 1973-1997",
                               year > 1997 ~ "recent, 1998-2020",
                               TRUE ~ NA_character_)) %>%
  select(yearrange, pdcomnam) %>%
  group_by(yearrange, pdcomnam) %>%
  summarise(count = n(), .groups = "drop") %>%
  pivot_wider(names_from = yearrange, values_from = count)
datatable(predn)
There is a table on the NEFSC shiny app that updates feeding guilds based on 50 predators, including striped bass, so perhaps we can use that instead of redoing this whole analysis.
dietoverlap <- read_csv(here("datfromshiny/tgmat.2022-02-15.csv"))
This can be input into a cluster analysis:
# follows example here https://cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html
library(dendextend)
# Euclidean distance between rows, then hclust with its default "complete" linkage
d_dietoverlap <- dist(dietoverlap)
guilds <- hclust(d_dietoverlap)
#plot(guilds)
dend <- as.dendrogram(guilds)
dend <- rotate(dend, 1:136)
# color the branches by cutting the tree into 6 groups
dend <- color_branches(dend, k=6)
# label each leaf as "predator name(row label)"
labels(dend) <- paste(as.character(names(dietoverlap[-1]))[order.dendrogram(dend)],
                      "(",labels(dend),")",
                      sep = "")
dend <- hang.dendrogram(dend,hang_height=0.1)
# reduce the size of the labels:
# dend <- assign_values_to_leaves_nodePar(dend, 0.5, "lab.cex")
dend <- set(dend, "labels_cex", 0.5)
# And plot:
par(mar = c(3,3,3,7))
plot(dend,
     main = "Clustered NEFSC diet data\n(the labels give the predator species/size)",
     horiz = TRUE, nodePar = list(cex = .007))
Another visualization, as a circular dendrogram:
par(mar = rep(0,4))
circlize_dendrogram(dend)
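The clustering above applies `dist()` to the overlap table, which gives Euclidean distances between rows, that is, between whole overlap profiles. If the table is a symmetric matrix of overlap values between 0 and 1 (an assumption about the file layout that is not checked here), another option is to use 1 - overlap directly as the dissimilarity. The sketch below compares the two on a made-up 4 x 4 overlap matrix with hypothetical predators A to D.

```r
# Toy symmetric overlap matrix for four hypothetical predators
overlap <- matrix(c(1.0, 0.8, 0.1, 0.2,
                    0.8, 1.0, 0.2, 0.3,
                    0.1, 0.2, 1.0, 0.7,
                    0.2, 0.3, 0.7, 1.0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(c("A", "B", "C", "D"),
                                  c("A", "B", "C", "D")))

# Option 1: Euclidean distance between rows (overlap profiles)
d_profile <- dist(overlap)

# Option 2: dissimilarity taken directly as 1 - overlap
d_direct <- as.dist(1 - overlap)

hc_profile <- hclust(d_profile, method = "average")
hc_direct  <- hclust(d_direct, method = "average")

# Group membership when each tree is cut into two clusters
cutree(hc_profile, k = 2)
cutree(hc_direct, k = 2)
```

In this toy case both options group A with B and C with D. With real data the two dissimilarities can produce different trees, so this is worth checking before settling on guild membership.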
List of “piscivores” (should be the same as shiny app):
#dend %>% get_nodes_attr("members")
# number of members by node
dend %>% get_nodes_attr("members", id = c(2,44)) #2 is first major node, 44 separates "piscivores"
## [1] 57 36
partition_leaves(dend)[[44]]
## [1] "Striped bass..M(106)" "Sea raven..L(89)"
## [3] "Striped bass..L(105)" "Bluefish..S(37)"
## [5] "Weakfish..M(117)" "Fourspot flounder..L(50)"
## [7] "Northern shortfin squid..M(71)" "Longfin squid..M(63)"
## [9] "Longfin squid..S(64)" "Northern shortfin squid..S(72)"
## [11] "Pollock..L(78)" "White hake..M(120)"
## [13] "Red hake..L(82)" "Silver hake..M(93)"
## [15] "Atlantic halibut..M(17)" "Pollock..XL(81)"
## [17] "White hake..L(119)" "Silver hake..L(92)"
## [19] "Cusk..L(45)" "Goosefish..XL(56)"
## [21] "Bluefish..L(35)" "Goosefish..L(53)"
## [23] "Goosefish..M(54)" "Goosefish..S(55)"
## [25] "Spiny dogfish..M(100)" "Thorny skate..XL(116)"
## [27] "Atlantic halibut..L(16)" "Sea raven..M(90)"
## [29] "Sea raven..S(91)" "Atlantic cod..XL(13)"
## [31] "Spiny dogfish..L(99)" "Spotted hake..M(103)"
## [33] "Summer flounder..M(111)" "Summer flounder..L(110)"
## [35] "Bluefish..M(36)" "Buckler dory..M(38)"
How much difference does clustering method make?
# again directly from https://cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html
hclust_methods <- c("ward.D", "single", "complete", "average", "mcquitty",
"median", "centroid", "ward.D2")
diet_dendlist <- dendlist()
for(i in seq_along(hclust_methods)) {
  hc_diet <- hclust(d_dietoverlap, method = hclust_methods[i])
  diet_dendlist <- dendlist(diet_dendlist, as.dendrogram(hc_diet))
}
names(diet_dendlist) <- hclust_methods
#diet_dendlist
Correlations between clustering methods
diet_dendlist_cor <- cor.dendlist(diet_dendlist)
#diet_dendlist_cor
corrplot::corrplot(diet_dendlist_cor, "pie", "lower")
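For intuition about what a correlation between two trees measures, the sketch below computes a cophenetic correlation by hand in base R. The cophenetic distance between two leaves is the height at which they first join in the tree, and the correlation compares those heights across two clustering methods. The data are random toy values for ten hypothetical species, not diet data.

```r
set.seed(1)
# Toy data: 10 hypothetical species described by 4 made-up variables
x <- matrix(rnorm(40), nrow = 10,
            dimnames = list(paste0("sp", 1:10), NULL))
d <- dist(x)

hc_complete <- hclust(d, method = "complete")
hc_average  <- hclust(d, method = "average")

# Pairwise merge heights implied by each tree
coph_complete <- cophenetic(hc_complete)
coph_average  <- cophenetic(hc_average)

# Correlation between the two sets of merge heights
coph_cor <- cor(coph_complete, coph_average)
coph_cor
```

Values near 1 mean the two methods place pairs of leaves at similar relative heights, even if the branch order drawn in the plots differs.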
Comparison of clusters
par(mfrow = c(4,2))
for(i in 1:8) {
  diet_dendlist[[i]] %>% set("branches_k_color", k=6) %>% plot(axes = FALSE, horiz = TRUE)
  title(names(diet_dendlist)[i])
}
Compare complete (default) and average
diet_dendlist %>% dendlist(which = c(3,4)) %>% ladderize %>%
  untangle(method = "step1side", k_seq = 2:6) %>%
  set("branches_k_color", k=6) %>%
  tanglegram(faster = TRUE)
#tanglegram(common_subtrees_color_branches = TRUE)
Compare complete and mcquitty
diet_dendlist %>% dendlist(which = c(3,5)) %>% ladderize %>%
  untangle(method = "step1side", k_seq = 2:6) %>%
  set("branches_k_color", k=6) %>%
  tanglegram(faster = TRUE)
Compare complete and ward.D
diet_dendlist %>% dendlist(which = c(3,1)) %>% ladderize %>%
  untangle(method = "step1side", k_seq = 2:6) %>%
  set("branches_k_color", k=6) %>%
  tanglegram(faster = TRUE)
Check agreement based on common nodes between methods, which looks better:
diet_dendlist_cor2 <- cor.dendlist(diet_dendlist, method = "common")
#diet_dendlist_cor2
corrplot::corrplot(diet_dendlist_cor2, "pie", "lower")
Maybe we go with ward.D which seems most consistent across methods?
d_dietoverlap <- dist(dietoverlap)
guilds <- hclust(d_dietoverlap, method = "ward.D")
#plot(guilds)
dend <- as.dendrogram(guilds)
dend <- rotate(dend, 1:136)
dend <- color_branches(dend, k=6)
labels(dend) <- paste(as.character(names(dietoverlap[-1]))[order.dendrogram(dend)],
                      "(",labels(dend),")",
                      sep = "")
dend <- hang.dendrogram(dend,hang_height=0.1)
# reduce the size of the labels:
# dend <- assign_values_to_leaves_nodePar(dend, 0.5, "lab.cex")
dend <- set(dend, "labels_cex", 0.5)
# And plot:
par(mar = c(3,3,3,7))
plot(dend,
     main = "Clustered NEFSC diet data\n(the labels give the predator species/size)",
     horiz = TRUE, nodePar = list(cex = .007))
List of “piscivores” from ward.D (this adds pelagic feeders I would not consider piscivores):
#dend %>% get_nodes_attr("members")
# number of members by node
dend %>% get_nodes_attr("members", id = c(2,36)) #2 is first major node, 36 separates "piscivores"
## [1] 60 43
partition_leaves(dend)[[36]]
## [1] "Spiny dogfish..M(100)" "Thorny skate..XL(116)"
## [3] "Atlantic cod..L(10)" "Winter skate..XL(130)"
## [5] "Spotted hake..M(103)" "Summer flounder..M(111)"
## [7] "Fourspot flounder..M(51)" "Summer flounder..S(112)"
## [9] "Atlantic herring..L(18)" "Atlantic mackerel..M(22)"
## [11] "Pollock..L(78)" "White hake..M(120)"
## [13] "Atlantic halibut..M(17)" "Pollock..XL(81)"
## [15] "Red hake..L(82)" "Silver hake..M(93)"
## [17] "Silver hake..L(92)" "Bluefish..L(35)"
## [19] "Goosefish..L(53)" "Cusk..L(45)"
## [21] "Goosefish..XL(56)" "White hake..L(119)"
## [23] "Goosefish..M(54)" "Goosefish..S(55)"
## [25] "Summer flounder..L(110)" "Bluefish..M(36)"
## [27] "Buckler dory..M(38)" "Atlantic halibut..L(16)"
## [29] "Sea raven..M(90)" "Sea raven..S(91)"
## [31] "Atlantic cod..XL(13)" "Spiny dogfish..L(99)"
## [33] "Longfin squid..M(63)" "Blackbelly rosefish..M(30)"
## [35] "Longfin squid..S(64)" "Northern shortfin squid..S(72)"
## [37] "Fourspot flounder..L(50)" "Northern shortfin squid..M(71)"
## [39] "Striped bass..M(106)" "Sea raven..L(89)"
## [41] "Striped bass..L(105)" "Bluefish..S(37)"
## [43] "Weakfish..M(117)"
Node 68 in this tree contains all bluefish sizes, perhaps closer to “pelagic piscivores”:
partition_leaves(dend)[[68]]
## [1] "Silver hake..L(92)" "Bluefish..L(35)"
## [3] "Goosefish..L(53)" "Cusk..L(45)"
## [5] "Goosefish..XL(56)" "White hake..L(119)"
## [7] "Goosefish..M(54)" "Goosefish..S(55)"
## [9] "Summer flounder..L(110)" "Bluefish..M(36)"
## [11] "Buckler dory..M(38)" "Atlantic halibut..L(16)"
## [13] "Sea raven..M(90)" "Sea raven..S(91)"
## [15] "Atlantic cod..XL(13)" "Spiny dogfish..L(99)"
## [17] "Longfin squid..M(63)" "Blackbelly rosefish..M(30)"
## [19] "Longfin squid..S(64)" "Northern shortfin squid..S(72)"
## [21] "Fourspot flounder..L(50)" "Northern shortfin squid..M(71)"
## [23] "Striped bass..M(106)" "Sea raven..L(89)"
## [25] "Striped bass..L(105)" "Bluefish..S(37)"
## [27] "Weakfish..M(117)"
Alternatively, break out the “piscivores” from all of the options and see how different they really are.
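One possible way to compare the bluefish group across methods is sketched below: cut each tree into the same number of groups, keep the group that contains bluefish, and then take the intersection and union of those groups. The data are random toy values with hypothetical predator names, and the choice of `k = 3` is arbitrary.

```r
set.seed(42)
# Toy data: 12 hypothetical predators described by 5 made-up diet variables
toy <- matrix(runif(60), nrow = 12,
              dimnames = list(c("Bluefish", paste0("Pred", 1:11)), NULL))
d <- dist(toy)

methods <- c("complete", "average", "ward.D")

# For each method, cut the tree into k groups and keep the group holding bluefish
bluefish_group <- lapply(methods, function(m) {
  groups <- cutree(hclust(d, method = m), k = 3)
  names(groups)[groups == groups["Bluefish"]]
})
names(bluefish_group) <- methods

# Predators grouped with bluefish under every method
core <- Reduce(intersect, bluefish_group)

# Predators grouped with bluefish under at least one method
any_method <- Reduce(union, bluefish_group)

bluefish_group
core
any_method
```

Predators in `core` would be the least method-dependent candidates, while those only in `any_method` are the ones whose inclusion depends on the clustering choice.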
If we trim the dendrogram down we can better see the distinctions within piscivores (not done yet):
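Since this step is not done yet, the following is only a sketch of one way it might be approached in base R: subset the distance matrix to the leaves of interest and re-cluster just those. The data and the `keep` vector are hypothetical stand-ins for the piscivore leaves. Re-clustering a subset can give a different tree than dropping leaves from the full tree, which dendextend's `prune()` does.

```r
set.seed(7)
# Toy data: 8 hypothetical predators described by 5 made-up diet variables
toy <- matrix(runif(40), nrow = 8,
              dimnames = list(paste0("Pred", 1:8), NULL))
d_all <- dist(toy)

# Hypothetical subset standing in for the "piscivore" leaves
keep <- c("Pred1", "Pred3", "Pred4", "Pred7")

# Subset the full distance matrix to the chosen leaves, then re-cluster
d_sub  <- as.dist(as.matrix(d_all)[keep, keep])
hc_sub <- hclust(d_sub, method = "ward.D")
dend_sub <- as.dendrogram(hc_sub)

# Smaller tree showing only the chosen leaves
plot(dend_sub, horiz = TRUE)
```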
Garrison, L., and Link, J. 2000. Dietary guild structure of the fish community in the Northeast United States continental shelf ecosystem. Marine Ecology Progress Series, 202: 231–240. http://www.int-res.com/abstracts/meps/v202/p231-240/ (Accessed 22 October 2018).
Schoener, T. W. 1970. Nonsynchronous Spatial Overlap of Lizards in Patchy Habitats. Ecology, 51: 408–418. https://onlinelibrary.wiley.com/doi/abs/10.2307/1935376 (Accessed 14 February 2022).